Methods and compositions for improved cattle longevity and milk production

ABSTRACT

A single nucleotide polymorphic site at position 10793 of the bovine POU1F1 gene is associated with improved longevity and milk production traits. Disclosed are nucleic acid molecules, kits, methods of genotyping and marker assisted bovine breeding methods.

GOVERNMENT INTEREST

This invention was made with United States government support awarded by the following agencies: USDA/CSREES 05-CRHF-0-6055. The United States government has certain rights in this invention.

FIELD OF THE INVENTION

The present invention relates to methods of cattle progeny testing that use molecular genetic techniques to assay for the presence of at least one genetic marker indicative of longevity and improved milk production.

BACKGROUND OF THE INVENTION

Dairy cows are significant investments for dairy farmers, and enormous effort, including selective breeding and artificial insemination, has been and continues to be invested in ensuring that the animals have high and sustained productivity and that the milk they produce is of high quality. Traditional breeding relies on studying the progeny of sires and evaluating their traits, including milk production ratings (transmitting abilities), to guide further breeding. This standard technique is time consuming and costly: evaluating the true genetic value of each bull by progeny testing takes years. Many cows must be bred and give birth to offspring. The female offspring must then be raised, bred, allowed to give birth and finally milked for a length of time before their phenotypic traits can be measured.

Furthermore, selection based purely on phenotypic characteristics does not efficiently take into account genetic variability caused by complex gene action and interactions, or the effects of environmental and developmental variables. There is thus a need for a method of genetically evaluating cattle that enables breeders to select animals more accurately at both the phenotypic and the genetic level.

Marker-assisted selection can lower the high cost of the progeny testing currently used to improve sires, since young bulls could be evaluated immediately after birth, and young bulls found by genetic testing to carry undesirable markers would never be progeny tested. Testing for the presence or absence of the marker may even be conducted before birth. Therefore, there is also a need for genetic markers associated with improved milk production traits.

POU1F1 is a member of the tissue-specific POU (Pit, Oct, Unc) homeobox family of transcription factor DNA-binding proteins, which is found in all mammals studied so far (Bastos et al., 2006; Ingraham et al., 1988; Ingraham et al., 1990). The pituitary-specific expression of POU1F1 is required for the activation of growth hormone (GH), prolactin (PRL), and thyroid stimulating hormone (TSH) (Li et al., 1990). These genes are involved in a variety of signaling pathways that are important for many developmental and physiological processes, including pituitary gland development (Li et al., 1990; Mullis, 2007), mammary gland development and growth (Svennersten-Sjaunja and Olsson, 2005), milk protein expression (Akers, 2006), and milk production and secretion (Svennersten-Sjaunja and Olsson, 2005). Moreover, binding of GH and PRL to their receptors on the cell membrane triggers a cascade of signaling events including the JAK/STAT pathway, which has been shown to be required for adult mammary gland development and lactogenesis (Liu et al., 1997).

Mutations in POU1F1 often result in severe GH deficiency as well as developmental defects (Mullis, 2007). In a dwarf mouse model, mutations in POU1F1 lead to the loss of three pituitary cell types: somatotropes, lactotropes and thyrotropes (Li et al., 1990). Lactotropes produce prolactin, which is necessary for mammary gland development and lactation.

Several genes in the same pathway as POU1F1 have been reported to be associated with different milk production and health traits. For example, the growth hormone receptor (GHR) and prolactin receptor (PRLR) genes have shown associations with milk yield and composition (Viitala et al., 2006). Also, the signal transducer and activator of transcription 1 (STAT1) and osteopontin (OPN) genes have been shown to have significant effects on milk yield and milk protein and fat yields in Holstein dairy cattle (Cobanoglu et al., 2006; Leonard et al., 2005; Schnabel et al., 2005). The uterine milk protein (UTMP) gene is another gene in the POU1F1 pathway that has been found to be associated with productive life in dairy cattle (Khatib et al., 2007b).

POU1F1 is located in bovine chromosome region BTA1q21-22 (Woollard et al., 2000), where multiple quantitative trait loci (QTL) affecting milk production traits have been identified (Georges et al., 1995; Nadesalingam et al., 2001). In previous studies, POU1F1 variants have been reported to be associated with milk yield and conformation traits (Renaville et al., 1997; Tuggle and Freeman, 1994). Taken together, the biological functions of POU1F1 and the associations of other genes in its pathway with production traits suggest that this gene could be functionally involved in milk yield and composition traits.

SUMMARY OF THE INVENTION

The present inventors investigated the effects of POU1F1 on health and milk composition traits in two independent North American Holstein cattle populations. A pooled DNA sequencing approach was used to identify single nucleotide polymorphisms (SNPs) in the gene. A SNP (C/A) in exon 3 of POU1F1 that changes a proline to a histidine was identified. A total of 2141 individuals from the two populations were genotyped for this SNP using a modified PCR-RFLP method. The frequencies of allele A were 14.9% and 16.8% in the two populations, respectively. Statistical analysis revealed a significant association of POU1F1 variants with milk yield and productive life, which makes POU1F1 a strong candidate for marker assisted selection in dairy cattle breeding programs.

Based on these results, the present invention provides an isolated nucleic acid molecule comprising the polymorphic site at position 10793 (“SNP 10793”) of SEQ ID NO: 1 and at least 17 contiguous nucleotides or bases of SEQ ID NO: 1 adjacent to the polymorphic site, wherein the nucleic acid molecule comprises an adenine base at position 10793 of SEQ ID NO: 1. Because SEQ ID NO: 1 is already known, the nucleic acid molecule does not encompass a molecule that consists of SEQ ID NO: 1.

Preferably, the nucleic acid molecule comprises at least 15, more preferably at least 20, still more preferably at least 25, contiguous bases of SEQ ID NO: 1 adjacent to the polymorphic site. In one embodiment, the isolated nucleic acid molecule comprises not more than 1,500 nt, preferably not more than 1,000 nt, more preferably not more than 900 nt, more preferably not more than 800 nt, more preferably not more than 700 nt, more preferably not more than 600 nt, more preferably not more than 500 nt, more preferably not more than 400 nt, more preferably not more than 300 nt, more preferably not more than 150 nt, more preferably not more than 100 nt, and still more preferably not more than 50 nt.

In the nucleic acid molecule, the polymorphic site is preferably within 4 nucleotides of the center of the molecule. More preferably, the polymorphic site is at the center of the nucleic acid molecule.

In another embodiment, the polymorphic site is at the 3′ end of the nucleic acid molecule.

The present invention also provides an array of nucleic acid molecules comprising at least two nucleic acid molecules described above.

The present invention further provides a kit comprising a nucleic acid molecule described above, and a suitable container.

Also provided is a method for detecting a single nucleotide polymorphism (SNP) in the bovine POU1F1 gene, wherein the POU1F1 gene has the nucleic acid sequence of SEQ ID NO: 1, the method comprising determining the identity of the nucleotide at position 10793 and comparing it with the nucleotide at the corresponding position of SEQ ID NO: 1.
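The comparison step of this method can be sketched in Python. This is a minimal sketch under stated assumptions: the sample sequence is taken to be already aligned to the SEQ ID NO: 1 numbering, and the short sequences below are hypothetical stand-ins, since SEQ ID NO: 1 itself is not reproduced here. The practical point it illustrates is that position 10793 is a 1-based (GenBank-style) coordinate, so it corresponds to index 10792 of a 0-based string.

```python
from typing import Tuple


def compare_at_position(sample: str, reference: str, position: int) -> Tuple[str, str, bool]:
    """Return (reference base, sample base, differs) at a 1-based position.

    Assumes `sample` is already aligned to the reference numbering, so the same
    coordinate (e.g. GenBank position 10793) addresses the same site in both.
    """
    if not 1 <= position <= min(len(sample), len(reference)):
        raise ValueError("position lies outside one of the sequences")
    ref_base = reference[position - 1].upper()  # 1-based coordinate -> 0-based index
    obs_base = sample[position - 1].upper()
    return ref_base, obs_base, obs_base != ref_base


# Hypothetical 9-nt toy sequences; position 5 stands in for position 10793.
reference = "GGTCCCAGT"
sample = "GGTCACAGT"
print(compare_at_position(sample, reference, 5))  # ('C', 'A', True)
```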

In another embodiment, the present invention provides a method for genotyping a bovine cell using the method above. A suitable bovine cell may be an adult cell, an embryo cell, a sperm, an egg, a fertilized egg, or a zygote. The identity of the nucleotide may be determined by sequencing the POU1F1 gene, or a relevant fragment thereof, isolated from the cell. The POU1F1 gene or relevant fragment is isolated from the cell by polymerase chain reaction (PCR) amplification of the cell's genomic DNA, or by RT-PCR of the cell's mRNA. Preferably, the PCR or RT-PCR is conducted with a pair of primers having the following sequences:

(SEQ ID NO: 2) CAAATGGTCCTTTTCTTGTTGTTACAGGGAGCTTAAGGC
(SEQ ID NO: 3) CTTTAAACTCATTGGCAAACTTTTC

In a further embodiment, the present invention provides a method for progeny testing of cattle, the method comprising collecting a nucleic acid sample from the progeny and genotyping said nucleic acid sample as described above.

Further provided is a method for selectively breeding cattle using a multiple ovulation and embryo transfer procedure (MOET), the method comprising superovulating a female animal, collecting eggs from said superovulated female, fertilizing said eggs in vitro with semen from a suitable male animal, implanting said fertilized eggs into other females to allow an embryo to develop, genotyping the developing embryo, and terminating the pregnancy if the developing embryo does not have adenine (A) at position 10793. Preferably, the pregnancy is terminated if the embryo is not homozygous A at position 10793.
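The selection decision reduces to a filter on embryo genotypes. The sketch below is hypothetical in its embryo identifiers and genotype strings, and it assumes, consistent with the testing method and the association results described herein, that only embryos homozygous A at position 10793 are retained. Genotypes are normalized first so that "C/A" and "AC" are treated as the same unordered pair.

```python
from typing import Dict, List


def normalize_genotype(genotype: str) -> str:
    """Return an unordered SNP10793 genotype in alphabetical order, e.g. 'C/A' -> 'AC'."""
    alleles = genotype.upper().replace("/", "")
    if len(alleles) != 2 or set(alleles) - {"A", "C"}:
        raise ValueError(f"unexpected SNP10793 genotype: {genotype!r}")
    return "".join(sorted(alleles))


def select_embryos(embryo_genotypes: Dict[str, str]) -> List[str]:
    """Return IDs of embryos that are homozygous A (genotype AA) at position 10793."""
    return [eid for eid, g in embryo_genotypes.items() if normalize_genotype(g) == "AA"]


# Hypothetical embryo IDs and blastomere genotypes.
embryos = {"E1": "C/A", "E2": "AA", "E3": "CC", "E4": "A/A"}
print(select_embryos(embryos))  # ['E2', 'E4']
```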

In a preferred embodiment, the method is used for selectively breeding dairy cattle and comprises selecting a bull that is heterozygous or homozygous A at position 10793 of its POU1F1 gene, and using its semen to fertilize a female animal. Preferably the bull is homozygous A at position 10793. More preferably, the female animal is also heterozygous or homozygous A at position 10793, preferably homozygous A. The MOET procedure may preferably be used for the selective breeding.
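Why both parents are preferably homozygous A follows from Mendelian segregation. A minimal sketch, assuming a single autosomal biallelic locus, independent transmission of either parental allele with probability 1/2, and no selection before genotyping:

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def offspring_distribution(sire: str, dam: str) -> Dict[str, Fraction]:
    """Expected offspring genotype frequencies for one autosomal biallelic locus.

    Each parent contributes one of its two alleles with equal probability, so the
    four sire/dam allele combinations (a Punnett square) are equally likely.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(sire, dam))
    total = sum(counts.values())  # always 4 for two diploid parents
    return {g: Fraction(n, total) for g, n in sorted(counts.items())}


print(offspring_distribution("AA", "AA"))  # {'AA': Fraction(1, 1)}
print(offspring_distribution("AC", "AC"))  # {'AA': Fraction(1, 4), 'AC': Fraction(1, 2), 'CC': Fraction(1, 4)}
```

Under these assumptions an AA × AA mating yields only AA offspring, whereas an AC × AC mating yields AA offspring only one time in four.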

The present invention also provides a method for testing dairy cattle for longevity, milk production traits, or both, comprising genotyping the animal's cells, wherein an animal that is homozygous A at position 10793 is indicated to have desirable longevity or milk production traits.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the POU1F1 gene sequence (SEQ ID NO: 1) where the relevant polymorphic site is shown.

FIG. 2 shows the protein sequence alignment of POU1F1 from mammalian species. Protein sequences of POU1F1 from mouse, rat, human, chimpanzee, bovine, and dog were aligned using the multiple alignment algorithm ClustalW and visualized with Jalview (EBI). Numbers at the top are the relative positions of amino acids. The position of the Pro76His mutation is indicated by the arrow.

DETAILED DESCRIPTION OF THE INVENTION

It has been found that a specific site, i.e. position 10793 (see FIG. 1), in the POU1F1 gene sequence is polymorphic. The term “polymorphism” as used herein refers to the occurrence of two or more alternative genomic sequences or alleles between or among different genomes or individuals. “Polymorphic” refers to the condition in which two or more variants of a specific genomic sequence can be found in a population. A “polymorphic site” is the locus at which the variation occurs. Polymorphisms generally have at least two alleles, each occurring at a significant frequency in a selected population. A polymorphic locus may be as small as one base pair. The first identified allelic form is arbitrarily designated as the reference form, and other allelic forms are designated as alternative or variant alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the wild type form. Diploid organisms may be homozygous or heterozygous for allelic forms. A biallelic polymorphism has two forms, and a triallelic polymorphism has three forms, and so on.
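The allele and genotype terms defined above translate directly into frequency arithmetic. A minimal sketch with hypothetical genotype counts (not study data); the Hardy-Weinberg function is an added textbook model, not an analysis reported in this work.

```python
from typing import Dict


def allele_frequency(n_aa: int, n_ac: int, n_cc: int) -> float:
    """Frequency of allele A from genotype counts at a biallelic A/C site."""
    n = n_aa + n_ac + n_cc
    if n == 0:
        raise ValueError("no genotyped individuals")
    # Each AA animal carries two A alleles, each AC animal one; 2n alleles in total.
    return (2 * n_aa + n_ac) / (2 * n)


def hardy_weinberg(p: float) -> Dict[str, float]:
    """Expected genotype frequencies under Hardy-Weinberg equilibrium, p = freq(A)."""
    q = 1.0 - p
    return {"AA": p * p, "AC": 2 * p * q, "CC": q * q}


# Hypothetical genotype counts, for illustration only.
p = allele_frequency(n_aa=1, n_ac=2, n_cc=7)
print(p)  # 0.2
print(hardy_weinberg(p))
```

With the allele A frequencies reported in the summary (0.149 and 0.168), hardy_weinberg gives expected AA frequencies of about 0.022 and 0.028, i.e. roughly 2 to 3% of animals, if the populations were in Hardy-Weinberg equilibrium (an assumption, not a reported result).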

Polymorphisms may result in functional differences through changes in the encoded polypeptide, changes in mRNA stability, altered binding of transcription and translation factors to the DNA or RNA, and the like. Polymorphisms are often used to detect genetic linkage to phenotypic variation.

One type of polymorphism, the single nucleotide polymorphism (SNP), has recently gained wide use for the detection of genetic linkage. SNPs are generally biallelic systems; that is, an individual may carry either of two alleles at any particular SNP marker. In the instant case, a SNP is used to determine the genotype of the POU1F1 gene, and the genotypes were found to be significantly associated with longevity and milk production traits.

In the context of the present specification, the provided sequences also encompass their complementary sequences, including the complements of the provided polymorphisms. To identify the specific polymorphic site unambiguously, the numbering of the original POU1F1 GenBank sequence, shown in FIG. 1, is used throughout.
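Because complementary sequences are encompassed, a probe or primer may be written on either strand. On the opposite strand the C/A polymorphism reads as G/T, and a 1-based position i on a sequence of length N becomes position N - i + 1 when counted from the other end. A minimal sketch of the reverse complement using only the Python standard library (no ambiguity codes handled):

```python
# Translation table for the four DNA bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACG"))  # CGTT
```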

The present invention provides nucleic acid based genetic markers for identifying bovine animals with superior longevity and milk production traits. In general, for use as markers, nucleic acid fragments, preferably DNA fragments, will be of at least 12 nucleotides (nt), preferably at least 15 nt, usually at least 20 nt, often at least 50 nt. Such small DNA fragments are useful as primers for the polymerase chain reaction (PCR), and probes for hybridization screening, etc.

The term primer refers to a single-stranded oligonucleotide capable of acting as a point of initiation of template-directed DNA synthesis under appropriate conditions (i.e., in the presence of four different nucleoside triphosphates and an agent for polymerization, such as DNA or RNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer but typically ranges from 15 to 30 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template but must be sufficiently complementary to hybridize with a template. The term primer site, or priming site, refers to the area of the target DNA to which a primer hybridizes. The term primer pair means a set of primers including a 5′ upstream primer that hybridizes with the 5′ end of the DNA sequence to be amplified and a 3′ downstream primer that hybridizes with the complement of the 3′ end of the sequence to be amplified.
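The dependence of hybrid stability on primer length and base composition can be screened with a rough melting-temperature estimate. The sketch below applies one commonly quoted approximation, Tm = 64.9 + 41 × (nGC − 16.4) / N, to the primer sequences SEQ ID NO: 2 and SEQ ID NO: 3. It is an assumption-laden first screen only: it ignores salt, primer concentration and nearest-neighbor stacking, and it is not the method used to design these primers.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def basic_tm(primer: str) -> float:
    """Rough melting temperature (deg C): Tm = 64.9 + 41 * (nGC - 16.4) / N.

    Ignores salt and oligo concentration and nearest-neighbor effects, so it is
    only a first screen, not a substitute for a nearest-neighbor calculation.
    """
    primer = primer.upper()
    n_gc = primer.count("G") + primer.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(primer)


primers = {
    "SEQ ID NO: 2": "CAAATGGTCCTTTTCTTGTTGTTACAGGGAGCTTAAGGC",
    "SEQ ID NO: 3": "CTTTAAACTCATTGGCAAACTTTTC",
}
for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.2f}, Tm ~{basic_tm(seq):.1f} C")
```

By this approximation the 39-nt primer (17 G/C) comes out near 65.5 C and the 25-nt primer (8 G/C) near 51.1 C; a gap of this size may be worth checking with a more rigorous calculator when choosing an annealing temperature.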

The term “probe” or “hybridization probe” denotes a defined nucleic acid segment (or nucleotide analog segment) that can be used to identify, by hybridization, a specific polynucleotide sequence present in samples, said nucleic acid segment comprising a nucleotide sequence complementary to the specific polynucleotide sequence to be identified. “Probes” or “hybridization probes” are nucleic acids capable of binding in a base-specific manner to a complementary strand of nucleic acid.

An objective of the present invention is to determine which allele of the polymorphism a specific DNA sample carries; for example, whether the nucleotide at position 10793 is A or C. An oligonucleotide probe can be used for this purpose. Preferably, the oligonucleotide probe carries a detectable label and contains an A at the position corresponding to the polymorphic site. Experimental conditions can be chosen such that if the sample DNA contains an A at the polymorphic site, a hybridization signal is detected because the probe hybridizes to the corresponding complementary DNA strand in the sample, whereas if the sample DNA contains a C, no hybridization signal is detected.
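The allele-specific probe logic can be sketched as follows, with the variable base at the exact center of each probe. The template, SNP index and flank length are hypothetical. Comparing a probe with the same-strand target segment stands in for checking complementarity to the opposite strand, and the mismatch count is only a proxy for the hybridization stringency that would be chosen experimentally.

```python
from typing import Dict, Sequence


def allele_probes(template: str, snp_index: int, flank: int,
                  alleles: Sequence[str] = ("A", "C")) -> Dict[str, str]:
    """Allele-specific probes of length 2*flank + 1 with the SNP at the center.

    `snp_index` is 0-based within `template`.
    """
    if snp_index < flank or snp_index + flank >= len(template):
        raise ValueError("not enough flanking sequence around the SNP")
    left = template[snp_index - flank:snp_index]
    right = template[snp_index + 1:snp_index + 1 + flank]
    return {a: left + a + right for a in alleles}


def mismatches(probe: str, target: str) -> int:
    """Position-wise mismatches between equal-length, same-strand sequences."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    return sum(p != t for p, t in zip(probe, target))


# Hypothetical sample carrying A at index 6.
sample = "TTGGTCAAGCTTG"
probes = allele_probes(sample, snp_index=6, flank=4)
target = sample[2:11]  # the 9-nt segment the probes align to
for allele, probe in probes.items():
    print(allele, probe, mismatches(probe, target))
# A GGTCAAGCT 0
# C GGTCCAGCT 1
```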

Similarly, PCR primers and conditions can be devised whereby the oligonucleotide is used as one of the PCR primers to analyze nucleic acids for the presence of a specific sequence. The template may be genomic DNA amplified directly, or the mRNA transcript of the POU1F1 gene amplified by RT-PCR. The use of the polymerase chain reaction is described in Saiki et al. (1985) Science 230:1350-1354. Amplification may be used to determine whether a polymorphism is present by using a primer that is specific for the polymorphism. Alternatively, various methods known in the art use oligonucleotide ligation as a means of detecting polymorphisms; for examples, see Riley et al. (1990) Nucleic Acids Res. 18:2887-2890; and Delahunty et al. (1996) Am. J. Hum. Genet. 58:1239-1246. The detection method may also be based on direct DNA sequencing, hybridization, or a combination thereof. Where large amounts of DNA are available, genomic DNA is used directly. Alternatively, the region of interest is cloned into a suitable vector and grown in sufficient quantity for analysis. The nucleic acid may also be amplified by PCR to provide sufficient amounts for analysis.
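For amplification with a polymorphism-specific primer, one simple design places the variable base at the primer's 3′ terminus, so that only the matching allele is efficiently extended. A minimal sketch that only generates the forward primer sequences; the template, index and the toy 5-nt length are hypothetical (real primers are typically 15 to 30 nt), and a working assay would also need a reverse primer and empirically tuned conditions.

```python
from typing import Dict, Sequence


def allele_specific_primers(template: str, snp_index: int, length: int,
                            alleles: Sequence[str] = ("A", "C")) -> Dict[str, str]:
    """Forward primers (5'->3') whose 3'-terminal base is the SNP allele.

    Each primer is the `length - 1` bases immediately upstream of the 0-based
    `snp_index`, followed by the allele being tested.
    """
    start = snp_index - (length - 1)
    if length < 2 or start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    upstream = template[start:snp_index]
    return {a: upstream + a for a in alleles}


# Hypothetical template with the variable base at index 6.
print(allele_specific_primers("TTGGTCAAGCTTG", snp_index=6, length=5))
# {'A': 'GGTCA', 'C': 'GGTCC'}
```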

Hybridization may be performed in solution, or when either the oligonucleotide probe or the target polynucleotide is covalently or noncovalently affixed to a solid support. Attachment may be mediated, for example, by antibody-antigen interactions, poly-L-Lys, streptavidin or avidin-biotin, salt bridges, hydrophobic interactions, chemical linkages, UV cross-linking, baking, etc. Oligonucleotides may be synthesized directly on the solid support or attached to the solid support after synthesis. Solid supports suitable for use in detection methods of the invention include substrates made of silicon, glass, plastic, paper and the like, which may be formed, for example, into wells (as in 96-well plates), slides, sheets, membranes, fibers, chips, dishes, and beads. The solid support may be treated, coated or derivatized to facilitate the immobilization of the allele-specific oligonucleotide or target nucleic acid. For screening purposes, hybridization probes for both polymorphic forms may be used, either in separate reactions, spatially separated on a solid phase matrix, or labeled such that they can be distinguished from each other.

Hybridization may also be performed with nucleic acid arrays and subarrays such as those described in WO 95/11995. The arrays would contain a battery of allele-specific oligonucleotides representing each of the polymorphic sites. One or both polymorphic forms may be present in the array; for example, the polymorphism at position 10793 may be represented by either or both of its alleles (A and C). Usually such an array will include at least 2 different polymorphic sequences, i.e. polymorphisms located at unique positions within the locus, and may include all of the provided polymorphisms. Arrays of interest may further comprise sequences, including polymorphisms, of other genes of interest. The oligonucleotide sequence on the array will usually be at least about 12 nt in length, may be the length of the provided polymorphic sequences, or may extend into the flanking regions to generate fragments of 100 to 200 nt in length. For examples of arrays, see Ramsay (1998) Nat. Biotech. 16:40-44; Hacia et al. (1996) Nature Genetics 14:441-447; Lockhart et al. (1996) Nature Biotechnol. 14:1675-1680; and De Risi et al. (1996) Nature Genetics 14:457-460.

The identity of polymorphisms may also be determined using a mismatch detection technique, including but not limited to the RNase protection method using riboprobes (Winter et al., Proc. Natl. Acad. Sci. USA 82:7575, 1985; Meyers et al., Science 230:1242, 1985) and proteins which recognize nucleotide mismatches, such as the E. coli mutS protein (Modrich, P. Ann. Rev. Genet. 25:229-253, 1991). Alternatively, variant alleles can be identified by single strand conformation polymorphism (SSCP) analysis (Orita et al., Genomics 5:874-879, 1989; Humphries et al., in Molecular Diagnosis of Genetic Diseases, R. Elles, ed., pp. 321-340, 1996) or denaturing gradient gel electrophoresis (DGGE) (Wartell et al., Nucl. Acids Res. 18:2699-2706, 1990; Sheffield et al., Proc. Natl. Acad. Sci. USA 86:232-236, 1989).

A polymerase-mediated primer extension method may also be used to identify the polymorphism(s). Several such methods have been described in the patent and scientific literature and include the “Genetic Bit Analysis” method (WO92/15712) and the ligase/polymerase-mediated genetic bit analysis (U.S. Pat. No. 5,679,524). Related methods are disclosed in WO91/02087, WO90/09455, WO95/17676, and U.S. Pat. Nos. 5,302,509 and 5,945,283. Extended primers containing a polymorphism may be detected by mass spectrometry as described in U.S. Pat. No. 5,605,798. Another primer extension method is allele-specific PCR (Ruaño et al., Nucl. Acids Res. 17:8392, 1989; Ruaño et al., Nucl. Acids Res. 19:6877-6882, 1991; WO 93/22456; Turki et al., J. Clin. Invest. 95:1635-1641, 1995). In addition, multiple polymorphic sites may be investigated by simultaneously amplifying multiple regions of the nucleic acid using sets of allele-specific primers as described in Wallace et al. (WO 89/10414).

A detectable label may be included in an amplification reaction. Suitable labels include fluorochromes, e.g. fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine (ROX), 6-carboxy-2′,4′,7′,4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM) or N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA), radioactive labels, e.g. ³²P, ³⁵S, ³H; etc. The label may be a two stage system, where the amplified DNA is conjugated to biotin, haptens, etc. having a high affinity binding partner, e.g. avidin, specific antibodies, etc., where the binding partner is conjugated to a detectable label. The label may be conjugated to one or both of the primers. Alternatively, the pool of nucleotides used in the amplification is labeled, so as to incorporate the label into the amplification product.

It is readily recognized by those ordinarily skilled in the art that, to maximize the signal-to-noise ratio, the polymorphic site should be at the center of the probe in a probe hybridization detection procedure, where a mismatch has the maximum destabilizing effect on the hybrid molecule; and in a PCR detection procedure, the polymorphic site should be placed at the very 3′ end of the primer, where a mismatch has the maximum effect in preventing chain elongation by the DNA polymerase. The location of a nucleotide relative to the center of a polynucleotide is described herein as follows. When a polynucleotide has an odd number of nucleotides, the nucleotide equidistant from the 3′ and 5′ ends is considered to be “at the center” of the polynucleotide, and the nucleotide at the center or either nucleotide immediately adjacent to it is considered to be “within 1 nucleotide of the center.” Likewise, in a polynucleotide with an odd number of nucleotides, any of the five nucleotides in the middle is considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even number of nucleotides, there is a bond and not a nucleotide at the center of the polynucleotide. Thus, either of the two central nucleotides is considered to be “within 1 nucleotide of the center,” any of the four nucleotides in the middle is considered to be “within 2 nucleotides of the center,” and so on.
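The center convention is easy to misapply for even-length oligonucleotides, so it may help to state it as a function. A minimal sketch that encodes the odd- and even-length rules described above for a 1-based position in an oligonucleotide of a given length; the function names are hypothetical.

```python
def distance_from_center(position: int, length: int) -> int:
    """Distance of a 1-based position from the center of an oligonucleotide.

    Odd length: the middle base is 0, its neighbours are 1, and so on.
    Even length: the two middle bases are 1, the next pair out are 2, and so on.
    """
    if not 1 <= position <= length:
        raise ValueError("position outside the oligonucleotide")
    # |2*position - (length + 1)| is twice the offset from the midpoint;
    # rounding half-offsets up reproduces the even-length convention.
    return (abs(2 * position - (length + 1)) + 1) // 2


def within(position: int, length: int, k: int) -> bool:
    """True if the position is 'within k nucleotides of the center'."""
    return distance_from_center(position, length) <= k


# For a 25-nt probe, list the positions within 4 nucleotides of the center.
print([i for i in range(1, 26) if within(i, 25, 4)])  # [9, 10, 11, 12, 13, 14, 15, 16, 17]
```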

In some embodiments, a composition contains two or more differently labeled oligonucleotides for simultaneously probing the identity of nucleotides or nucleotide pairs at two or more polymorphic sites. It is also contemplated that primer compositions may contain two or more sets of allele-specific primer pairs to allow simultaneous targeting and amplification of two or more regions containing a polymorphic site.

Alternatively, the relevant portion of the POU1F1 gene of the sample of interest may be amplified via PCR and directly sequenced, and the sequence compared to the wild type sequence shown in FIG. 1. It is readily recognized that numerous primers other than those specifically disclosed herein can be devised to achieve this objective. PCR and sequencing techniques are well known in the art, and reagents and equipment are readily available commercially.

DNA markers have several advantages: segregation is easy to measure and unambiguous, and DNA markers are co-dominant, i.e., heterozygous and homozygous animals can be distinguished. Once a marker system is established, selection decisions can be made very easily, since DNA markers can be assayed as soon as a blood sample can be collected from the young animal, or even earlier by testing embryos in vitro if very early embryos are collected. The use of marker assisted genetic selection will greatly facilitate and speed up cattle breeding programs. For example, a modification of the multiple ovulation and embryo transfer (MOET) procedure can be used with genetic marker technology. Specifically, females are superovulated, and eggs are collected, fertilized in vitro using semen from superior males, and implanted into other females, allowing use of the superior genetics of the female (as well as the male) without having to wait for her to give birth to one calf at a time. Developing blastomeres at the 4-8 cell stage may be assayed for the presence of the marker, and selection decisions made accordingly.

In one embodiment of the invention, an assay is provided for detecting the presence of a desirable genotype using the markers.

The term “genotype” as used herein refers to the identity of the alleles present in an individual or a sample. In the context of the present invention a genotype preferably refers to the description of the polymorphic alleles present in an individual or a sample. The term “genotyping” a sample or an individual for a polymorphic marker refers to determining the specific allele or the specific nucleotide carried by an individual at a polymorphic marker.

The present invention is suitable for identifying a bovine, including a young or adult bovine animal, an embryo, a semen sample, an egg, a fertilized egg, or a zygote, or other cell or tissue sample therefrom, to determine whether said bovine possesses the desired genotypes of the present invention, some of which are indicative of improved milk production traits.

Further provided is a method for genotyping the bovine POU1F1 gene, comprising determining, for the two copies of the POU1F1 gene present, the identity of the nucleotide pair at position 10793.

One embodiment of a genotyping method of the invention involves examining both copies of the POU1F1 gene, or a fragment thereof, to identify the nucleotide pair at the polymorphic site in the two copies to assign a genotype to the individual. In some embodiments, “examining a gene” may include examining one or more of: DNA containing the gene, mRNA transcripts thereof, or cDNA copies thereof. As will be readily understood by the skilled artisan, the two “copies” of a gene, mRNA or cDNA, or fragment thereof in an individual may be the same allele or may be different alleles. In another embodiment, a genotyping method of the invention comprises determining the identity of the nucleotide pair at the polymorphic site.
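Assigning a genotype from the two copies can be illustrated with counts of A and C observations at the site, for example from sequencing reads or cloned copies. This is a swapped-in illustration, not the PCR-RFLP procedure used in this work, and the depth and fraction thresholds below are hypothetical values, not parameters from this study.

```python
def call_genotype(count_a: int, count_c: int, min_depth: int = 10,
                  het_min_fraction: float = 0.2) -> str:
    """Assign an SNP10793 genotype from counts of A and C observations.

    Hypothetical rule: below `min_depth` observations the call is 'NA'; otherwise
    the sample is heterozygous (AC) if the minor allele makes up at least
    `het_min_fraction` of observations, else homozygous for the major allele.
    """
    depth = count_a + count_c
    if depth < min_depth:
        return "NA"
    frac_a = count_a / depth
    if het_min_fraction <= frac_a <= 1 - het_min_fraction:
        return "AC"
    return "AA" if frac_a > 0.5 else "CC"


print(call_genotype(12, 11), call_genotype(20, 0), call_genotype(1, 19), call_genotype(3, 2))
# AC AA CC NA
```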

The present invention further provides a kit for genotyping a bovine sample, the kit comprising in a container a nucleic acid molecule, as described above, designed for detecting the polymorphism, and optionally at least another component for carrying out such detection. Preferably, a kit comprises at least two oligonucleotides packaged in the same or separate containers. The kit may also contain other components such as hybridization buffer (where the oligonucleotides are to be used as a probe) packaged in a separate container. Alternatively, where the oligonucleotides are to be used to amplify a target region, the kit may contain, preferably packaged in separate containers, a polymerase and a reaction buffer optimized for primer extension mediated by the polymerase, such as PCR.

In one embodiment the present invention provides a breeding method whereby genotyping as described above is conducted on bovine embryos, and based on the results, certain cattle are either selected or dropped out of the breeding program.

Through use of the linked marker loci, procedures termed “marker assisted selection” (MAS) may be used for genetic improvement within a breeding nucleus; or “marker assisted introgression” for transferring useful alleles from a resource population to a breeding nucleus (Soller 1990; Soller 1994).

The present invention discloses the association between POU1F1 and milk production and longevity in a total of 2141 individuals from two independent Holstein dairy cattle populations. SNP10793 allele A was associated with a significant increase in PTA for milk yield in the CDDR granddaughter design population but not in the daughter design UW resource population. Although the granddaughter design has more power than the daughter design for detecting QTL (Weller et al., 1990), the use of PTA values in the CDDR population may limit the detection of epistasis and dominance effects. Thus, to test for genetic interaction between the A and C alleles of SNP10793, genotypic effects were estimated in the UW population using the YD data. Genotype AA was associated with increased milk yield and productive life compared with the CC and AC genotypes. This indicates complete dominance of the C allele over the A allele in determining the phenotypic value of productive life and milk yield, and could also explain the lack of a significant association between POU1F1 and productive life when PTAs were used in the allele substitution model.
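Why a recessive (C-dominant) effect can be diluted in an allele substitution model can be shown with toy numbers. A minimal sketch comparing a least-squares slope on the number of A alleles (an additive, allele-substitution style estimate) with an AA versus non-AA contrast. The data are invented for illustration, and the study's actual models (PTAs, YDs, pedigree and sire effects) are not reproduced.

```python
from typing import List


def additive_slope(genotypes: List[str], phenotypes: List[float]) -> float:
    """Least-squares slope of phenotype on the number of A alleles (0, 1, 2)."""
    x = [g.count("A") for g in genotypes]
    n = len(x)
    mx = sum(x) / n
    my = sum(phenotypes) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, phenotypes))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def recessive_contrast(genotypes: List[str], phenotypes: List[float]) -> float:
    """Mean phenotype of AA minus the mean of AC and CC combined (C dominant)."""
    aa = [y for g, y in zip(genotypes, phenotypes) if g == "AA"]
    rest = [y for g, y in zip(genotypes, phenotypes) if g != "AA"]
    return sum(aa) / len(aa) - sum(rest) / len(rest)


# Toy data (not study data): AC and CC identical, AA shifted by 3 units.
genotypes = ["CC", "CC", "AC", "AC", "AA", "AA"]
phenotypes = [0.0, 0.0, 0.0, 0.0, 3.0, 3.0]
print(additive_slope(genotypes, phenotypes))      # 1.5
print(recessive_contrast(genotypes, phenotypes))  # 3.0
```

In this toy case the per-allele additive estimate is half the actual AA difference, because heterozygotes show no effect; with AA animals rare, such an attenuated estimate is harder to detect as significant.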

A different SNP, located in exon 6 of POU1F1, has been reported to be associated with milk yield (Tuggle and Freeman, 1994; Renaville et al., 1997). The population sizes used in those studies were relatively small (115 and 98, respectively) compared with the total of 2141 individuals from two independent populations investigated in the current study. In addition, the SNP reported in the current study is a missense mutation, whereas the SNP reported by Tuggle and Freeman (1994) and Renaville et al. (1997) is a synonymous mutation.

The candidate gene approach has been widely and successfully used in medical and agricultural studies to identify the genes underlying complex traits such as disease susceptibility and production traits. We have used this approach to identify a number of genes associated with milk production and health traits, including OLR1 (Khatib et al., 2006; Khatib et al., 2007a), PI (Khatib et al., 2005), OPN (Leonard et al., 2005), STAT1 (Cobanoglu et al., 2006), and UTMP (Khatib et al., 2007b). In addition to functional information, positional information about the investigated gene is usually incorporated into this approach to identify candidate genes. However, production traits are very complex by nature and are determined by multiple factors, including single gene effects, interactions between genes, and environmental factors. Therefore, in addition to positional and functional information about the single gene, functional information about the signaling pathway and regulatory network in which the candidate gene is involved should be incorporated to aid the identification of candidate genes. In light of this notion, POU1F1 was chosen as a candidate gene for milk production traits. First, POU1F1 is a transcription factor that controls the expression of GH and PRL, two genes important in mammary gland development and in milk production and secretion. Second, genes downstream of POU1F1 in its signaling pathway (e.g. STAT1, OPN, UTMP) have been reported to be associated with milk production and health traits. Third, the proline at the mutated position is highly conserved among mammalian species (FIG. 2), so a mutation at this position could change the function of the protein.

In summary, based on positional, functional, and regulatory information, POU1F1 was chosen as a candidate gene for investigating associations with milk production and health traits. We identified SNP10793, a C-to-A substitution that replaces a proline with a histidine in the POU1F1 protein. The rarer AA genotype was associated with a significant increase in productive life and milk yield. These results suggest that POU1F1 could be used in marker assisted selection programs in dairy cattle.

The following examples are intended to illustrate preferred embodiments of the invention and should not be interpreted to limit the scope of the invention as defined in the claims.

EXAMPLES

Materials and Methods

Cattle Population and Phenotypic Data

Semen samples from 31 Holstein sires and their 1299 sons were obtained from the Cooperative Dairy DNA Repository (CDDR) of the USDA Bovine Functional Genomics Laboratory (Beltsville, Md.). Blood samples (n=842) were obtained from the University of Wisconsin (UW) resource population (Gonda et al., 2006; Cobanoglu et al., 2006; Khatib et al., 2007b). Phenotypic data, including predicted transmitting abilities (PTAs) and yield deviations (YDs) for milk yield, fat yield, protein yield, productive life, and somatic cell score (SCS), were obtained from the USDA Animal Improvement Program Laboratory (Beltsville, Md.).

Single Nucleotide Polymorphism (SNP) Identification

SNP were identified in POU1F1 using the pooled DNA sequencing approach described in Leonard et al. (2005). Briefly, genomic DNA was extracted from 30 individuals and quantified using a spectrophotometer; equal amounts of DNA from each individual were then pooled and subjected to PCR amplification using different pairs of primers designed within POU1F1. PCR products were sequenced using forward and reverse primers, and SNP were identified by visual inspection of the chromatograms. To validate the SNP identified in the pools, the individuals composing these pools were also sequenced individually.

SNP Genotyping

Genotyping of the identified SNP was performed using a PCR-restriction fragment length polymorphism (PCR-RFLP) method. To genotype SNP3699, primers (forward: atactcatcagagaactgcc and reverse: cattaaccctgttggtatgg) were used to amplify a 771 bp genomic fragment of POU1F1, and the PCR products were digested with the restriction enzyme TaqI. Depending on the availability of restriction enzymes and the suitability of the sequence, a PCR primer can instead be designed to alter the nucleotide sequence near the SNP so as to create a restriction site. For SNP10793, primers (forward: caaatggtccttttcttgttgttacagggagcttaaggc and reverse: ctttaaactcattggcaaacttttc) were designed to amplify a 234 bp PCR product. In the forward primer, the two Cs located 2 and 3 nucleotides upstream of the SNP were changed to Gs to create a recognition site for the restriction enzyme StuI.
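The mismatch-primer design for SNP10793 can be checked in silico. The Python sketch below appends each allele to the 3′ end of the forward primer given above and searches for the StuI recognition sequence AGGCCT, which StuI cleaves bluntly between AGG and CCT. It assumes, because the text does not state it, that the base immediately 3′ of the SNP is T, as the StuI design implies; the rest of the 234 bp amplicon is not reproduced here and is not needed for this check.

```python
# Forward primer for SNP10793, as given in the text (5'->3').
FORWARD_PRIMER = "caaatggtccttttcttgttgttacagggagcttaaggc"

STUI_SITE = "AGGCCT"  # StuI recognition sequence
STUI_CUT_OFFSET = 3   # StuI cuts AGG^CCT, producing blunt ends

# Assumption (not stated in the text): the base 3' of the SNP is T,
# so that the C allele completes the AGGCCT site.
ASSUMED_DOWNSTREAM_BASE = "T"


def stui_cut_positions(seq: str) -> list[int]:
    """Return the cut positions (bp from the 5' end) of every StuI site in seq."""
    seq = seq.upper()
    cuts = []
    start = seq.find(STUI_SITE)
    while start != -1:
        cuts.append(start + STUI_CUT_OFFSET)
        start = seq.find(STUI_SITE, start + 1)
    return cuts


def primer_with_allele(allele: str) -> str:
    """Forward primer followed by the SNP allele and the assumed downstream base."""
    return FORWARD_PRIMER + allele + ASSUMED_DOWNSTREAM_BASE


for allele in ("C", "A"):
    print(allele, stui_cut_positions(primer_with_allele(allele)))
```

With the C allele the site is complete and the cut falls 38 bp from the 5′ end of the amplicon, consistent with the 38 bp fragment reported for the C allele; with the A allele the sequence reads AGGCAT, no site forms, and the product remains uncut.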

A touchdown PCR program was used as follows: initial denaturation at 94° C. for 5 min; 33 cycles of 94° C. for 45 s, touchdown annealing for 45 s (decreasing from 63° C. to 50° C., then held at 50° C. for 25 cycles), and 72° C. for 45 s; and a final extension at 72° C. for 7 min. The PCR products were digested with TaqI or StuI (Promega, Madison, Wis.) according to the manufacturer's instructions and resolved by 2% agarose gel electrophoresis. For SNP3699, the A allele was indicated by two bands of 338 and 433 bp, and the G allele by three bands of 84, 254, and 433 bp. For SNP10793, the C allele was indicated by two bands of 38 and 198 bp, while the A allele was indicated by a single uncut band of 234 bp.
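The band patterns above map directly onto genotype calls. The sketch below assumes that a heterozygote shows the union of the bands of its two alleles and that band sizes are read exactly; real gel sizing is approximate, so a practical version would match bands within a tolerance. The function name `call_genotype` is hypothetical.

```python
# Expected band sizes (bp) for each allele, taken from the text.
ALLELE_BANDS = {
    "SNP3699": {"A": {338, 433}, "G": {84, 254, 433}},
    "SNP10793": {"C": {38, 198}, "A": {234}},
}


def call_genotype(snp: str, observed: set[int]) -> str:
    """Return the genotype whose expected band pattern equals the observed bands.

    Assumes a heterozygote shows the union of both alleles' bands.
    """
    alleles = sorted(ALLELE_BANDS[snp])
    for i, a1 in enumerate(alleles):
        for a2 in alleles[i:]:
            expected = ALLELE_BANDS[snp][a1] | ALLELE_BANDS[snp][a2]
            if expected == observed:
                return a1 + a2
    raise ValueError(f"band pattern {sorted(observed)} matches no {snp} genotype")


print(call_genotype("SNP10793", {38, 198}))          # C allele bands only
print(call_genotype("SNP10793", {38, 198, 234}))     # both alleles
print(call_genotype("SNP3699", {84, 254, 338, 433})) # both alleles
```

Alleles are sorted alphabetically, so heterozygotes are reported as AC and AG, matching the genotype labels used in Table 1.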

Statistical Analysis

For the CDDR population, the association between the number of A alleles at the POU1F1 locus and production traits was analyzed with a weighted least-squares allele substitution model of the following form:

$$y_{ij} = \mu + \mathrm{sire}_i + \beta x_{ij} + \varepsilon_{ij}$$

where $y_{ij}$ is the PTA of son $j$ of sire $i$ for the trait considered, $\mu$ represents a general constant, $\mathrm{sire}_i$ is the fixed effect of the $i$th sire, $\beta$ represents half of the allele substitution effect ($\alpha/2$), $x_{ij}$ is the number of A alleles (0, 1, or 2) at SNP3699 or SNP10793, and $\varepsilon_{ij}$ is the residual term.
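As a sketch of the allele substitution model, the Python code below estimates β by weighted least squares with sire as a fixed effect. It absorbs the sire effects by centring PTAs and allele counts on their weighted within-sire means (the Frisch-Waugh-Lovell result), which gives the same β as fitting μ and sire dummies explicitly. The weights used in the study are not stated, so the weights here, like the simulated sires and sons, are hypothetical. In R, a comparable fit would be `lm(y ~ factor(sire) + x, weights = w)`.

```python
import math
import random
from collections import defaultdict


def fit_allele_substitution(y, sire, x, w):
    """WLS estimate of beta in y_ij = mu + sire_i + beta * x_ij + e_ij.

    Sire fixed effects are absorbed by centring y and x on their weighted
    within-sire means. Returns (beta, se_beta); beta estimates alpha / 2.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # sire -> [sum w, sum w*x, sum w*y]
    for yi, si, xi, wi in zip(y, sire, x, w):
        acc = sums[si]
        acc[0] += wi
        acc[1] += wi * xi
        acc[2] += wi * yi
    means = {s: (a[1] / a[0], a[2] / a[0]) for s, a in sums.items()}

    sxx = sxy = 0.0
    centred = []
    for yi, si, xi, wi in zip(y, sire, x, w):
        xc, yc = xi - means[si][0], yi - means[si][1]
        centred.append((xc, yc, wi))
        sxx += wi * xc * xc
        sxy += wi * xc * yc
    beta = sxy / sxx

    # Weighted residual variance on n - (number of sires) - 1 degrees of freedom.
    rss = sum(wi * (yc - beta * xc) ** 2 for xc, yc, wi in centred)
    df = len(centred) - len(sums) - 1
    se = math.sqrt(rss / df / sxx)
    return beta, se


# Hypothetical data: 5 sires with 40 sons each, true beta = 36.
random.seed(1)
sire, x, w, y = [], [], [], []
for s in range(5):
    for _ in range(40):
        xi = random.choice((0, 1, 2))      # number of A alleles
        wi = random.uniform(0.5, 1.0)      # hypothetical weight
        sire.append(s)
        x.append(xi)
        w.append(wi)
        y.append(100 + 10 * s + 36.0 * xi + random.gauss(0, 5) / math.sqrt(wi))

beta, se = fit_allele_substitution(y, sire, x, w)
print(f"beta = {beta:.2f} +/- {se:.2f}")
```

Absorbing the sire effects avoids building a dummy column for each sire, which keeps the sketch short; with 31 sires, an explicit design matrix would give the same β.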

For the UW resource population, the association of the POU1F1 polymorphism with production traits was evaluated with the following mixed-effects model:

$$y_{ijklm} = \mu + h_i + s_j + mgs_k + d_{ijkl}\,\tau + p_m + \varepsilon_{ijklm}$$

where $y_{ijklm}$ represents, in turn, the yield deviation for milk, protein, or fat yield, or for productive life, of daughter $l$ of sire $j$ and maternal grandsire $k$; $\tau$ represents the effect of Mycobacterium avium ssp. paratuberculosis infection status; $d_{ijkl}$ is an indicator variable taking the value 0 for non-infected and 1 for infected cows; and $p_m$ represents the effect of the POU1F1 genotype at SNP10793 ($m$ = AA, AC, CC). Herd ($h$), sire ($s$), and maternal grandsire ($mgs$) effects were fitted in the model as random. Because relationships among animals were not accounted for in the analysis, the variance structures of the sire and maternal grandsire effects were $\mathbf{I} \otimes \sigma_s^2$ and $\mathbf{I} \otimes \sigma_{mgs}^2$, respectively, and that of the herd effect was $\mathbf{I} \otimes \sigma_h^2$. Standard assumptions were made for the residual term $\varepsilon_{ijklm}$. The additive genetic effect was estimated as half of the difference between the two homozygote groups, and the dominance effect as the difference between the heterozygote and the average of the two homozygotes. The degree of dominance was estimated as the ratio of the dominance effect to the additive effect (Falconer and Mackay, 1996); an absolute value approaching 1 indicates complete dominance, that is, the heterozygote has the same effect as one of the homozygotes. All statistical analyses were implemented with the “lm” and “lme” functions of the freely available R software, version 2.5.1.
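The additive effect, dominance effect, and degree of dominance defined above are simple functions of the three genotype effects. The sketch below computes them from the genotype effects reported in Table 3 for the UW population, with CC set to zero; the function name `dominance_summary` is illustrative.

```python
def dominance_summary(g_cc: float, g_het: float, g_aa: float) -> tuple[float, float, float]:
    """Return (additive effect a, dominance effect d, degree of dominance d / a).

    a = (AA - CC) / 2 and d = AC - (AA + CC) / 2, following Falconer and Mackay (1996).
    """
    a = (g_aa - g_cc) / 2
    d = g_het - (g_aa + g_cc) / 2
    return a, d, d / a


# Genotype effects from Table 3 (UW yield deviations; CC arbitrarily set to zero).
milk_yield = dominance_summary(g_cc=0.0, g_het=-13.26, g_aa=722.55)
productive_life = dominance_summary(g_cc=0.0, g_het=0.46, g_aa=5.17)
print(milk_yield)
print(productive_life)
```

The computed ratio is negative (about −1.04 for milk yield and −0.82 for productive life) because the heterozygote effect lies close to that of CC; Table 3 reports its magnitude.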

Example 1 Identification of SNP

Sequencing of 30 grandsires from the CDDR population and of the pooled DNA samples revealed four SNP in protein-coding exons of POU1F1: an A/G SNP at position 3699 (GenBank accession number NW_001501776) in exon 2, and three SNP in exon 3, namely an A/C SNP at position 10793, a C/T SNP at position 10822, and an A/G SNP at position 10863. SNP3699 (exon 2), SNP10822 (exon 3), and SNP10863 (exon 3) were in complete linkage disequilibrium (LD); therefore, only SNP3699 was used for genotyping. SNP10793 (exon 3) was not in LD with the other SNP, so it was genotyped independently. SNP3699, SNP10822, and SNP10863 are synonymous mutations, whereas SNP10793 is a missense mutation: the change from C to A (the minor allele) changes amino acid 76 of the POU1F1 protein from proline (Pro) to histidine (His). Alignment of the POU1F1 protein sequences of mouse, rat, human, chimpanzee, bovine, and dog using the multiple alignment algorithm ClustalW revealed that this proline is highly conserved among these species (FIG. 2).
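Whether a coding SNP is synonymous or missense follows from translating the reference and alternative codons with the standard genetic code. The codon carrying SNP10793 is not given in the text. A single C-to-A substitution can turn proline into histidine only at the second position of CCT or CCC (CCA and CCG would give glutamine), so the sketch below uses CCT purely for illustration. The function name `classify_snv` is hypothetical.

```python
# Standard genetic code; codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a coding single-nucleotide change as synonymous, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    return "nonsense" if alt_aa == "*" else "missense"


# Illustrative codon only: CCT (Pro) -> CAT (His).
print(CODON_TABLE["CCT"], CODON_TABLE["CAT"], classify_snv("CCT", "CAT"))
```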

Example 2 Association of POU1F1 with Milk Production and Health Traits

The allele and genotype frequencies of SNP3699 and SNP10793, together with the corresponding chi-square tests of Hardy-Weinberg equilibrium (HWE), are listed in Table 1. The genotype frequencies were consistent with those expected in a population in Hardy-Weinberg equilibrium (all χ² values were below the 5% critical value of 3.84). Association testing of SNP3699 in the CDDR population showed no significant association with any of the examined traits (data not shown), so this SNP was not investigated in the UW resource population.
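The chi-square test of Hardy-Weinberg equilibrium compares observed genotype counts with those expected from the allele frequency (p², 2pq, q²). The sketch below recomputes the Table 1 statistics from the genotype counts, using the standard 1-degree-of-freedom goodness-of-fit statistic without a continuity correction; the function name `hwe_test` is illustrative.

```python
def hwe_test(n_aa: int, n_het: int, n_other: int) -> tuple[float, float]:
    """Return (frequency of allele A, chi-square statistic with 1 df) for HWE."""
    n = n_aa + n_het + n_other
    p = (2 * n_aa + n_het) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_het, n_other)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, chi2


CRITICAL_5PCT_1DF = 3.84

# Genotype counts from Table 1: (AA, heterozygote, other homozygote).
rows = {
    "CDDR SNP3699": (12, 149, 319),
    "CDDR SNP10793": (31, 325, 943),
    "UW SNP10793": (26, 231, 585),
}
for label, counts in rows.items():
    maf, chi2 = hwe_test(*counts)
    print(f"{label}: MAF = {maf:.2f}, chi2 = {chi2:.3f}, "
          f"HWE rejected: {chi2 > CRITICAL_5PCT_1DF}")
```

The recomputed values agree with Table 1 to within rounding in the last reported digit, and none exceeds the 3.84 critical value.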

In contrast to SNP3699, SNP10793 was significantly associated (P = 0.027) with milk yield in the CDDR population under the allele substitution model (Table 2). Analysis of PTAs in the UW population did not detect an association between milk yield and the POU1F1 locus. However, in the YD analysis, the AA genotype was associated with higher milk yield (Table 3). The AA genotype was also positively associated with productive life. Because PTAs assume purely additive gene action, dominance effects were estimated in the UW population using yield deviation data (Table 3). For both milk yield and productive life, the ratio of the dominance effect to the additive effect was close to 1 in absolute value (1.04 and 0.82, respectively), suggesting complete dominance. The frequency of the A allele of SNP10793 was 15% in the CDDR population and 17% in the UW population (Table 1).

TABLE 1 Allele and genotype frequencies and tests of HWE for the identified SNP of POU1F1 in the CDDR and UW resource Holstein cattle populations

| Population | SNP | n^(c) | MAF^(a) | Genotype counts | χ² (DF = 1)^(b) |
|---|---|---|---|---|---|
| CDDR | SNP3699 | 480 | 0.18 | AA = 12, AG = 149, GG = 319 | 1.228 |
| CDDR | SNP10793 | 1299 | 0.15 | AA = 31, AC = 325, CC = 943 | 0.227 |
| UW | SNP10793 | 842 | 0.17 | AA = 26, AC = 231, CC = 585 | 0.300 |

^(a)MAF: minor allele frequency (the A allele for both SNP3699 and SNP10793).
^(b)With 1 degree of freedom, the 5% critical value of χ² is 3.84.
^(c)n = number of individuals genotyped.

TABLE 2 Estimates of the allele substitution effects and standard errors (SE) of SNP10793^(a) for production trait PTA values in the CDDR and UW Holstein cattle populations

| Trait | CDDR α/2 ± SE | UW α/2 ± SE |
|---|---|---|
| Milk yield | 72.24 ± 32.64* | 37.11 ± 35.02 |
| Fat yield | 1.69 ± 1.23 | 1.59 ± 1.34 |
| Fat percentage | −0.362 ± 0.500 | 0.052 ± 0.482 |
| Protein yield | 1.22 ± 0.835 | 0.792 ± 0.936 |
| Protein percentage | −0.353 ± 0.229 | −0.101 ± 0.231 |
| Productive life | −0.520 ± 0.684 | 0.028 ± 0.047 |

^(a)The effect of substituting allele C with allele A.
*P < 0.05.
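The significance flag in Table 2 can be checked from each estimate and its standard error. The sketch below computes a two-sided P value from z = estimate/SE under a normal approximation; the study used R's “lm”, which reports t-based P values, but with about 1,300 sons the two are very close. The function name `wald_p_value` is illustrative.

```python
import math


def wald_p_value(estimate: float, se: float) -> float:
    """Two-sided P value for H0: effect = 0, using a normal approximation to estimate / SE."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


# CDDR milk yield, Table 2: alpha/2 = 72.24, SE = 32.64
print(round(wald_p_value(72.24, 32.64), 3))  # 0.027, matching P = 0.027 in the text
# UW milk yield, Table 2: alpha/2 = 37.11, SE = 35.02 (not flagged as significant)
print(round(wald_p_value(37.11, 35.02), 3))
```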

TABLE 3 Estimates of the genotypic effects, standard errors (SE), and additive and dominance effects of SNP10793 in the UW resource Holstein cattle population

| Effect | Milk yield | P value | Productive life | P value |
|---|---|---|---|---|
| Genotype effect^(a): AC | −13.26 ± 150.88 | 0.9301 | 0.46 ± 0.87 | 0.5973 |
| Genotype effect^(a): AA | 722.55 ± 378.80 | 0.0592 | 5.17 ± 2.25 | 0.0240 |
| Dominance effect^(a) | −374.54 ± 220.66 | 0.0926 | −2.12 ± 1.30 | 0.1065 |
| Additive effect | 361.28 ± 189.40 | 0.0592 | 2.58 ± 1.12 | 0.0240 |
| Degree of dominance^(b) | 1.04 | | 0.82 | |

^(a)Estimates of yield deviations from the UW population; the effect of genotype CC was arbitrarily set to zero.
^(b)Degree of dominance was estimated as the ratio of the dominance effect to the additive effect.

REFERENCES

- Akers, R. M. 2006. Major advances associated with hormone and growth factor regulation of mammary growth and lactation in dairy cows. J. Dairy Sci. 89(4):1222-1234.
- Bastos, E., I. Santos, I. Parmentier, J. L. Castrillo, A. Cravador, H. Guedes-Pinto, and R. Renaville. 2006. Ovis aries POU1F1 gene: cloning, characterization and polymorphism analysis. Genetica 126(3):303-314.
- Cobanoglu, O., I. Zaitoun, Y. M. Chang, G. E. Shook, and H. Khatib. 2006. Effects of the signal transducer and activator of transcription 1 (STAT1) gene on milk production traits in Holstein dairy cattle. J. Dairy Sci. 89(11):4433-4437.
- Falconer, D. S., and T. F. C. Mackay. 1996. Introduction to Quantitative Genetics. 4th ed. Addison Wesley Longman Limited, England.
- Georges, M., D. Nielsen, M. Mackinnon, A. Mishra, R. Okimoto, A. T. Pasquino, L. S. Sargeant, A. Sorensen, M. R. Steele, X. Zhao, et al. 1995. Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics 139(2):907-920.
- Gonda, M. G., Y. M. Chang, G. E. Shook, M. T. Collins, and B. W. Kirkpatrick. 2006. Genetic variation of Mycobacterium avium ssp. paratuberculosis infection in US Holsteins. J. Dairy Sci. 89(5):1804-1812.
- Ingraham, H. A., R. P. Chen, H. J. Mangalam, H. P. Elsholtz, S. E. Flynn, C. R. Lin, D. M. Simmons, L. Swanson, and M. G. Rosenfeld. 1988. A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell 55(3):519-529.
- Ingraham, H. A., S. E. Flynn, J. W. Voss, V. R. Albert, M. S. Kapiloff, L. Wilson, and M. G. Rosenfeld. 1990. The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell 61(6):1021-1033.
- Khatib, H., E. Heifetz, and J. C. Dekkers. 2005. Association of the protease inhibitor gene with production traits in Holstein dairy cattle. J. Dairy Sci. 88(3):1208-1213.
- Khatib, H., S. D. Leonard, V. Schutzkus, W. Luo, and Y. M. Chang. 2006. Association of the OLR1 gene with milk composition in Holstein dairy cattle. J. Dairy Sci. 89(5):1753-1760.
- Khatib, H., G. J. Rosa, K. Weigel, F. Schiavini, E. Santus, and A. Bagnato. 2007a. Additional support for an association between OLR1 and milk fat traits in cattle. Anim. Genet. 38(3):308-310.
- Khatib, H., V. Schutzkus, Y. M. Chang, and G. J. Rosa. 2007b. Pattern of expression of the uterine milk protein gene and its association with productive life in dairy cattle. J. Dairy Sci. 90(5):2427-2433.
- Leonard, S., H. Khatib, V. Schutzkus, Y. M. Chang, and C. Maltecca. 2005. Effects of the osteopontin gene variants on milk production traits in dairy cattle. J. Dairy Sci. 88(11):4083-4086.
- Li, S., E. B. Crenshaw, 3rd, E. J. Rawson, D. M. Simmons, L. W. Swanson, and M. G. Rosenfeld. 1990. Dwarf locus mutants lacking three pituitary cell types result from mutations in the POU-domain gene pit-1. Nature 347(6293):528-533.
- Liu, X., G. W. Robinson, K. U. Wagner, L. Garrett, A. Wynshaw-Boris, and L. Hennighausen. 1997. Stat5a is mandatory for adult mammary gland development and lactogenesis. Genes Dev. 11(2):179-186.
- Mullis, P. E. 2007. Genetics of growth hormone deficiency. Endocrinol. Metab. Clin. North Am. 36(1):17-36.
- Nadesalingam, J., Y. Plante, and J. P. Gibson. 2001. Detection of QTL for milk production on Chromosomes 1 and 6 of Holstein cattle. Mamm. Genome 12(1):27-31.
- Renaville, R., N. Gengler, E. Vrech, A. Prandi, S. Massart, C. Corradini, C. Bertozzi, F. Mortiaux, A. Burny, and D. Portetelle. 1997. Pit-1 gene polymorphism, milk yield, and conformation traits for Italian Holstein-Friesian bulls. J. Dairy Sci. 80(12):3431-3438.
- Schnabel, R. D., J. J. Kim, M. S. Ashwell, T. S. Sonstegard, C. P. Van Tassell, E. E. Connor, and J. F. Taylor. 2005. Fine-mapping milk production quantitative trait loci on BTA6: analysis of the bovine osteopontin gene. Proc. Natl. Acad. Sci. USA 102(19):6896-6901.
- Svennersten-Sjaunja, K., and K. Olsson. 2005. Endocrinology of milk production. Domest. Anim. Endocrinol. 29(2):241-258.
- Tuggle, C. K., and A. E. Freeman, Inventors. 1994. Genetic marker for improved milk production traits in cattle. Iowa State University Research Foundation, Inc., assignee. U.S. Pat. No. 5,614,364.
- Viitala, S., J. Szyda, S. Blott, N. Schulman, M. Lidauer, A. Maki-Tanila, M. Georges, and J. Vilkki. 2006. The role of the bovine growth hormone receptor and prolactin receptor genes in milk, fat and protein production in Finnish Ayrshire dairy cattle. Genetics 173(4):2151-2164.
- Weller, J. I., Y. Kashi, and M. Soller. 1990. Power of daughter and granddaughter designs for determining linkage between marker loci and quantitative trait loci in dairy cattle. J. Dairy Sci. 73(9):2525-2537.
- Woollard, J., C. K. Tuggle, and F. A. Ponce de Leon. 2000. Rapid communication: localization of POU1F1 to bovine, ovine, and caprine 1q21-22. J. Anim. Sci. 78(1):242-243.

1. An isolated nucleic acid molecule comprising a polymorphic site at position 10793 and at least 15 contiguous bases of SEQ ID NO: 1 adjacent to the polymorphic site, wherein the nucleic acid molecule comprises an adenine base at position 10793, or a nucleic acid molecule that is fully complementary to the nucleic acid molecule.
 2. A nucleic acid molecule according to claim 1, which comprises at least 17 contiguous bases of SEQ ID NO: 1 adjacent to the polymorphic site.
 3. A nucleic acid molecule according to claim 1, which comprises at least 20 contiguous bases of SEQ ID NO: 1 adjacent to the polymorphic site.
 4. An isolated nucleic acid molecule according to claim 1, which comprises not more than 150 nt.
 5. An isolated nucleic acid molecule according to claim 1, which comprises not more than 100 nt.
 6. An isolated nucleic acid molecule according to claim 1, which comprises not more than 50 nt.
 7. A nucleic acid molecule according to claim 1, wherein the polymorphic site is within 4 nucleotides of the center of the nucleic acid molecule.
 8. A nucleic acid molecule according to claim 7, wherein the polymorphic site is at the center of the nucleic acid molecule.
 9. A nucleic acid molecule according to claim 1, wherein the polymorphic site is at the 3′-end of the nucleic acid molecule.
 10. An array of nucleic acid molecules comprising at least two nucleic acid molecules according to claim 8.
 11. A kit comprising a nucleic acid molecule of claim 1, and a suitable container.
 12. A method for detecting a single nucleotide polymorphism (SNP) in the bovine POU1F1 gene, wherein the POU1F1 gene has a nucleic acid sequence of SEQ ID NO: 1, the method comprising determining the identity of a nucleotide at position 10793, and comparing the identity to the nucleotide identity at a corresponding position of SEQ ID NO: 1.
 13. A method for genotyping a bovine cell, comprising obtaining a nucleic acid sample from said cell and determining the identity of the nucleotide at position 10793 of the bovine POU1F1 gene according to claim 12.
 14. A method according to claim 13, wherein the bovine cell is an adult cell, an embryo cell, a sperm, an egg, a fertilized egg, or a zygote.
 15. A method according to claim 13, wherein the identity of the nucleotide is determined by sequencing the POU1F1 gene, or a relevant fragment thereof, isolated from the cell.
 16. A method according to claim 15, wherein the gene or a relevant fragment thereof is isolated from the cell via amplification by the polymerase chain reaction (PCR) of genomic DNA of the cell, or by RT-PCR of the mRNA of the cell.
 17. A method according to claim 15, wherein both copies of the gene in the cell are genotyped.
 18. A method for progeny testing of cattle, the method comprising collecting a nucleic acid sample from said progeny, and genotyping said nucleic acid sample according to claim 13.
 19-25. (canceled)